AITopics | Hayward

Collaborating Authors

Hayward

Spectral-Transport Stability and Benign Overfitting in Interpolating Learning

Fredriksson-Imanov, Gustav Olaf Yunus Laitinen-Lundström

arXiv.org Machine LearningApr-13-2026

We develop a theoretical framework for generalization in the interpolating regime of statistical learning. The central question is why highly overparameterized estimators can attain zero empirical risk while still achieving nontrivial predictive accuracy, and how to characterize the boundary between benign and destructive overfitting. We introduce a spectral-transport stability framework in which excess risk is controlled jointly by the spectral geometry of the data distribution, the sensitivity of the learning rule under single-sample replacement, and the alignment structure of label noise. This leads to a scale-dependent Fredriksson index that combines effective dimension, transport stability, and noise alignment into a single complexity parameter for interpolating estimators. We prove finite-sample risk bounds, establish a sharp benign-overfitting criterion through the vanishing of the index along admissible spectral scales, and derive explicit phase-transition rates under polynomial spectral decay. For a model-specific specialization, we obtain an explicit theorem for polynomial-spectrum linear interpolation, together with a proof of the resulting rate. The framework also clarifies implicit regularization by showing how optimization dynamics can select interpolating solutions of minimal spectral-transport energy. These results connect algorithmic stability, double descent, benign overfitting, operator-theoretic learning theory, and implicit bias within a unified structural account of modern interpolation.

artificial intelligence, interpolation, machine learning, (18 more...)

arXiv.org Machine Learning

2604.08625

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Energy Score-Guided Neural Gaussian Mixture Model for Predictive Uncertainty Quantification

Yang, Yang, Ji, Chunlin, Li, Haoyang, Deng, Ke

arXiv.org Machine LearningMar-31-2026

Quantifying predictive uncertainty is essential for real world machine learning applications, especially in scenarios requiring reliable and interpretable predictions. Many common parametric approaches rely on neural networks to estimate distribution parameters by optimizing the negative log likelihood. However, these methods often encounter challenges like training instability and mode collapse, leading to poor estimates of the mean and variance of the target output distribution. In this work, we propose the Neural Energy Gaussian Mixture Model (NE-GMM), a novel framework that integrates Gaussian Mixture Model (GMM) with Energy Score (ES) to enhance predictive uncertainty quantification. NE-GMM leverages the flexibility of GMM to capture complex multimodal distributions and leverages the robustness of ES to ensure well calibrated predictions in diverse scenarios. We theoretically prove that the hybrid loss function satisfies the properties of a strictly proper scoring rule, ensuring alignment with the true data distribution, and establish generalization error bounds, demonstrating that the model's empirical performance closely aligns with its expected performance on unseen data. Extensive experiments on both synthetic and real world datasets demonstrate the superiority of NE-GMM in terms of both predictive accuracy and uncertainty quantification.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

2603.27672

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Alameda County > Hayward (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Exponential Family Estimation via Adversarial Dynamics Embedding

Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

Neural Information Processing SystemsFeb-12-2026, 15:30:13 GMT

Theorem 1 (Fencheldualoflog-partition (Wainwrightand Jordan,2008)) Let H(q): = R q(x) logq(x)dx. The C. Compared optimization Goodfello, 2014; Arjovsk, 2017; Dai, 2017), thereversalmin-maxin (20), themajor sharesparameters updatesofthe accelerating learnedadv empirically 5. Similaroptimization(13) with (17).

artificial intelligence, machine learning, neural information processing system, (13 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

60495b4e033e9f60b32a6607b587aadd-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 14:55:15 GMT

Furthermore, weprove information theoretic lower bounds which establish minimax optimality of the skillparameter estimation technique usedinouralgorithm. These bounds utilize a continuum version of Fano's method along with a careful covering argument.

artificial intelligence, estimation, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England (0.05)
Oceania > New Zealand (0.04)
(10 more...)

Industry: Leisure & Entertainment > Sports > Soccer (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

Finite-TimeAnalysisofRound-Robin Kullback-LeiblerUpperConfidenceBoundsfor OptimalAdaptiveAllocationwithMultiplePlaysand MarkovianRewards

Neural Information Processing SystemsFeb-8-2026, 12:46:35 GMT

Forouranalysis wedevise several concentration results forMarkovchains, including amaximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal, finite-time guarantees for the case of multiple plays, and i.i.d.

artificial intelligence, data mining, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe > Hungary > Budapest > Budapest (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)
Information Technology > Data Science > Data Mining > Big Data (0.31)

Add feedback

It's all In the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms

Kang, Keegan, Wang, Kerong, Zhang, Ding, Pratap, Rameshwar, Verma, Bhisham Dev, Wong, Benedict H. W.

arXiv.org Machine LearningFeb-5-2026

Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an exponential family, an optimal CVE will achieve the same asymptotic variance as the MLE, giving an Expectation-Maximization (EM) algorithm for the MLE. Experiments show the EM algorithm is faster and numerically stable compared to other root finding algorithms for the MLE for the bivariate Normal distribution, and we expect this to hold across distributions satisfying these conditions. We show how the EM algorithm leads to reproducibility for algorithms using MLE / CVE, and demonstrate how the EM algorithm leads to finding the MLE when the CV weights are known.

artificial intelligence, machine learning, variance, (18 more...)

arXiv.org Machine Learning

2601.22378

Country:

Asia > China > Hong Kong (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Bayesian Empirical Bayes: Simultaneous Inference from Probabilistic Symmetries

Wu, Bohan, Weinstein, Eli N., Blei, David M.

arXiv.org Machine LearningDec-24-2025

Empirical Bayes (EB) improves the accuracy of simultaneous inference "by learning from the experience of others" (Efron, 2012). Classical EB theory focuses on latent variables that are iid draws from a fitted prior (Efron, 2019). Modern applications, however, feature complex structure, like arrays, spatial processes, or covariates. How can we apply EB ideas to these settings? We propose a generalized approach to empirical Bayes based on the notion of probabilistic symmetry. Our method pairs a simultaneous inference problem-with an unknown prior-to a symmetry assumption on the joint distribution of the latent variables. Each symmetry implies an ergodic decomposition, which we use to derive a corresponding empirical Bayes method. We call this methodBayesian empirical Bayes (BEB). We show how BEB recovers the classical methods of empirical Bayes, which implicitly assume exchangeability. We then use it to extend EB to other probabilistic symmetries: (i) EB matrix recovery for arrays and graphs; (ii) covariate-assisted EB for conditional data; (iii) EB spatial regression under shift invariance. We develop scalable algorithms based on variational inference and neural networks. In simulations, BEB outperforms existing approaches to denoising arrays and spatial data. On real data, we demonstrate BEB by denoising a cancer gene-expression matrix and analyzing spatial air-quality data from New York City.

empirical bayes, statistics, symmetry, (14 more...)

arXiv.org Machine Learning

2512.16239

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Alameda County > Hayward (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback

Comparing Two Proxy Methods for Causal Identification

Guo, Helen, Ogburn, Elizabeth L., Shpitser, Ilya

arXiv.org Machine LearningDec-9-2025

Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors composing counterfactual quantities by exploiting unique determination of eigenspaces. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.

assumption, geng and tchetgen tchetgen, kuroki and pearl, (13 more...)

arXiv.org Machine Learning

2512.00175

Country:

North America > United States > California > Alameda County > Hayward (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

The Nondecreasing Rank

McCormack, Andrew

arXiv.org Machine LearningSep-3-2025

In this article the notion of the nondecreasing (ND) rank of a matrix or tensor is introduced. A tensor has an ND rank of r if it can be represented as a sum of r outer products of vectors, with each vector satisfying a monotonicity constraint. It is shown that for certain poset orderings finding an ND factorization of rank $r$ is equivalent to finding a nonnegative rank-r factorization of a transformed tensor. However, not every tensor that is monotonic has a finite ND rank. Theory is developed describing the properties of the ND rank, including typical, maximum, and border ND ranks. Highlighted also are the special settings where a matrix or tensor has an ND rank of one or two. As a means of finding low ND rank approximations to a data tensor we introduce a variant of the hierarchical alternating least squares algorithm. Low ND rank factorizations are found and interpreted for two datasets concerning the weight of pigs and a mental health survey during the COVID-19 pandemic.

artificial intelligence, factorization, machine learning, (18 more...)

arXiv.org Machine Learning

2509.00265

Country:

North America > Canada > Alberta (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
(9 more...)

Genre: Research Report (0.63)

Industry: Health & Medicine > Therapeutic Area (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts

Yan, Fanqi, Nguyen, Huy, Le, Dung, Akbarian, Pedram, Ho, Nhat, Rinaldo, Alessandro

arXiv.org Machine LearningMay-27-2025

The softmax-contaminated mixture of experts (MoE) model is deployed when a large-scale pre-trained model, which plays the role of a fixed expert, is fine-tuned for learning downstream tasks by including a new contamination part, or prompt, functioning as a new, trainable expert. Despite its popularity and relevance, the theoretical properties of the softmax-contaminated MoE have remained unexplored in the literature. In the paper, we study the convergence rates of the maximum likelihood estimator of gating and prompt parameters in order to gain insights into the statistical properties and potential challenges of fine-tuning with a new prompt. We find that the estimability of these parameters is compromised when the prompt acquires overlapping knowledge with the pre-trained model, in the sense that we make precise by formulating a novel analytic notion of distinguishability. Under distinguishability of the pre-trained and prompt models, we derive minimax optimal estimation rates for all the gating and prompt parameters. By contrast, when the distinguishability condition is violated, these estimation rates become significantly slower due to their dependence on the prompt convergence rate to the pre-trained model. Finally, we empirically corroborate our theoretical findings through several numerical experiments.

artificial intelligence, exp, machine learning, (18 more...)

arXiv.org Machine Learning

2505.18455

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Alameda County > Hayward (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback